The Polygraph Place


  Polygraph Place Bulletin Board
  Professional Issues - Private Forum for Examiners ONLY
  Cardio and EDA scoring question (Page 1)

Author Topic:   Cardio and EDA scoring question
rnelson
Member
posted 09-18-2006 09:00 AM
My question:

Do we score BP/BV and EDA amplitude changes from the question onset or from the lowest point following the question onset?

Here is an image of the problem
http://www.raymondnelson.us/qc/BP_EDA_question.jpg

You can see at R6 the EDA tracing descends before ascending. You can also see the cardio tracing descend before ascending. The chart divisions are ½ per five seconds on X-axis or timescale, and ¼ inch on Y-axis or amplitude. The measurements are correct according to some instructions, and incorrect according to others.

Instructions from Dutton (2000) on Kircher measurements (Kircher and Raskin, 1988) for OSS scoring provide the following guidelines: 1) RLL for 10 seconds from question onset, 2) EDA amplitude of a phasic response that begins from .5 sec after the question onset to "about eight seconds after the question onset" (this is somewhat imprecise) and is scored to the peak of response (without a set window of time), and 3) blood volume amplitude increase from question onset to the end of the reaction or the end of the question window - the question window is described as extending to the greatest reaction amplitude before the next question. Dutton (2000) also describes visually estimating and tracing the average/mean (it's actually the median) between the diastolic and systolic tracing tips (max and min values for each cardio cycle), or alternately scoring the increase in amplitude at the plot of the diastolic (lower) cardio tips, and states "If the tracing drops downward at stimulus onset and never rises above the level at question onset, it is recorded as having 0 units of amplitude, since negative values are not used in OSS."

Consider this: If the cardio tracing drops downward at stimulus onset, and then increases to a level greater than the question onset, do we measure from the stimulus/question onset value, or the minimum value after the tracing drops downward? Dutton (2000) describes drawing a horizontal reference line from the value at stimulus onset, which gives the impression that the tracing is not measured from minimum value. However, this doesn't make sense to me. The (hypothetical) descending segment in an EDA tracing is a meaningless feature, indicative only of reabsorption in the skin (there are no parasympathetic neurons in the skin). So we are now permitting reabsorption to deplete the measured value of the interpretable sympathetic or ascending (phasic) response (activated by sympathetic acetylcholine in the skin - not by sweat, just hook up your EDA leads and run your finger on the skin of your arm to produce a reaction that is not driven by sweat).

Krapohl and McManus' (1999) description of the Kircher features, in their description of the OSS, similarly gives the respiration scoring window as 10 seconds following the question onset. However, they describe EDA reactions as phasic amplitude increases that begin between .5 seconds after the stimulus onset and 5 seconds after the point of answer, and "to a maximum of 20 seconds." I take the instructions to indicate the 20 second EDA window beginning at stimulus onset, with the requirement that the measured phasic reaction begins between .5 seconds after stimulus onset and 5 seconds after the answer. Krapohl and McManus (1999) assert that Kircher features have specific measurement windows, and stated “BV is the mean pulse wave, measured at stimulus onset until the presentation of the next question.” BV refers to Blood Volume, and I believe they clearly intend for the measurement to be the amplitude of increase to the maximum phasic response. Again though, it is not clear whether the onset value for measurement is the stimulus/question onset, or the minimum value if the tracing descends at stimulus onset.

Based upon physiological parameters, I'm not sure I agree with using the stimulus onset value. While ascending cardio tracings are associated with sympathetic nervous system activation (driven by epinephrine and norepinephrine in the cardiovascular system), descending cardio tracings are indicative of parasympathetic cardiovascular activation (driven by acetylcholine – yes, parasympathetic acetylcholine, just look in your physiology textbook). So, scoring from stimulus onset, when the tracing descends before ascending, effectively allows parasympathetic activity to deplete the measured response amplitude of sympathetic nervous system reactions. It would make more sense physiologically, to me, to score from the minimum value to maximum value, within the scoring window - provided, of course, that the maximum value occurs after the minimum value on the x-axis or time-scale (kymograph, or whatever) – as this assures a positive slope for the measured segment.
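To make the difference between the two conventions concrete, here is a minimal sketch. It assumes the tracing inside the scoring window is available as a plain list of numbers in arbitrary units, with the first value taken at stimulus onset. The function names and sample values are made up for illustration and do not come from Dutton (2000), Krapohl and McManus (1999), or the Extract program.

```python
def amplitude_from_onset(samples):
    """Rise above the value at stimulus onset (samples[0]).

    A tracing that never rises above its onset value gets 0,
    since negative values are not used.
    """
    onset = samples[0]
    return max(0.0, max(samples) - onset)


def amplitude_from_low_point(samples):
    """Rise from the lowest value in the window to the highest value
    that occurs at or after that low point on the time axis."""
    low_index = min(range(len(samples)), key=lambda i: samples[i])
    return max(samples[low_index:]) - samples[low_index]


# Hypothetical tracing that descends before ascending.
tracing = [10.0, 9.0, 8.0, 9.5, 12.0, 11.0]
print(amplitude_from_onset(tracing))      # measured from the onset value
print(amplitude_from_low_point(tracing))  # measured from the low point
```

With a tracing like this the low-point measurement is larger than the onset measurement by exactly the size of the initial descent, which is the portion of the response at issue here.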

Krapohl and McManus (1999) described their use of the EXTRACT software tool, from Johns Hopkins University, to derive their data values. So, the answer to this question may rest with the folks at Johns Hopkins. Or perhaps someone on this forum knows the answer.

Not satisfied, I went to the local research library (at the university just two blocks from my home), to obtain a printout from microfiche (some interesting old technology) of the Kircher and Raskin (1988) publication that is cited so often. (Why hadn't I done that earlier?)

Kircher and Raskin (1988) described their procedures for feature extraction, and explained they retained EDA (skin conductance) data, sampled at 100ms, for 20 seconds beginning at the onset of each question. They further describe smoothing the response curve, using a step-wise averaging procedure, prior to analysis, after which they tested the curve for positive slope at every five samples (½ sec.) and used changes in slope (from zero or negative to positive) to identify the low points of the SC/EDA response curve.
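A rough sketch of that low-point search, under some loud assumptions: the data are a list of samples taken every 100ms, a simple centered moving average stands in for the step-wise averaging procedure (the exact smoothing used in 1988 is not described here), and the slope is tested across each five-sample step. The function names are hypothetical.

```python
def smooth(samples, width=5):
    """Centered moving average; the window shrinks near the edges.
    This is a stand-in, not the published smoothing procedure."""
    half = width // 2
    smoothed = []
    for i in range(len(samples)):
        window = samples[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def low_points(samples, step=5):
    """Indices where the slope, tested every `step` samples, changes
    from zero or negative to positive."""
    lows = []
    previous_slope = None
    for i in range(0, len(samples) - step, step):
        slope = samples[i + step] - samples[i]
        if previous_slope is not None and previous_slope <= 0 and slope > 0:
            lows.append(i)
        previous_slope = slope
    return lows


# Hypothetical curve: falls for one second, then rises for one second.
curve = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(low_points(smooth(curve)))  # index of the detected low point
```

Because the slope is only tested every five samples, a low point is located to the nearest half second, not to the exact sample.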

This is interesting, because the units of measurement are unspecified or arbitrary (or vaguely specified at best) as with most current polygraph systems. (Limestone will report the tracing measurement as displayed for the active screen display size or printed paper size, however this again is variable with the sensitivity setting and ratio assumptions about those measurements are not possible). Miritello (1999) handles this in a well-recognized and expedient manner, by assigning rank values to each measurement, then dividing by the number of possible rank measurement values – which has the effect of mathematically canceling out the ranks, and leaving a normalized proportion score with no units of measurement.
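One possible reading of that rank approach, sketched under assumptions: each measurement is ranked from 1 (smallest) to n (largest), tied values share the average rank, and each rank is divided by n. This is only an illustration of how ranking removes the units; it is not taken from Miritello (1999), and the function name is made up.

```python
def rank_proportions(measurements):
    """Rank each value from 1 (smallest) to n (largest), give tied values
    the average of their ranks, and divide every rank by n."""
    n = len(measurements)
    order = sorted(range(n), key=lambda i: measurements[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values.
        while j + 1 < n and measurements[order[j + 1]] == measurements[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return [rank / n for rank in ranks]


# The same reactions measured at two hypothetical sensitivity settings.
low_gain = [3.0, 12.0, 7.5, 1.0]
high_gain = [30.0, 120.0, 75.0, 10.0]
print(rank_proportions(low_gain))
print(rank_proportions(high_gain))  # same proportions as above
```

Changing the sensitivity rescales every measurement but leaves the ordering alone, so the proportions come out the same.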

After identifying low points, Kircher and Raskin (1988) described that they then identified the high points of the SC/EDA response curve, after which they apparently isolated the high points of each segment between low points and then isolated the exact times and levels of the low points.

Kircher and Raskin (1988) described the use of Hg (mercury) strain gauges for thoracic and abdominal respiration (as opposed to our more common pneumatic bellows). (The mercury strain gauge, in presumably smaller form, is currently used in penile plethysmograph testing of sex offenders.) They describe that they retained respiration data for 20 seconds, beginning at stimulus onset, at 500ms (½ sec) intervals, and applied a smoothing procedure to the four respiration cycles surrounding the answer.

Kircher and Raskin (1988) described their procedures for obtaining BV (blood volume) measurements, beginning with a 10ms sampling of cardiovascular activity for 20 seconds beginning at stimulus onset. They then tested each successive measurement for slope, and identified the diastolic and systolic measurement values for each second by second segment, and converted those measurements to second by second averages.

Kircher and Raskin (1988) further describe their development of FPA (finger pulse amplitude) and FBV (finger blood volume) data, obtained at 100ms intervals for 20 seconds beginning at stimulus onset, through a single Clairex CL703L CdSe photo-conductive cell and a miniature tungsten lamp activated by 3V. (I find this interesting in consideration of current discussion about the fingertip photoplethysmograph.) They describe subtracting the diastolic value from the systolic value, then dividing by the average of each diastolic and systolic value and subtracting the quotients from unity. (I'll have to find out what that means, but it sounds as if unity may be a foundational value derived from the entire dataset). The result of this is the FPA tracings rise with decreases in pulse amplitude and fall with increases in pulse amplitude (in opposition to the cardio tracing).
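A small sketch of that FPA arithmetic, reading "unity" in its usual mathematical sense as the number 1. That reading is an assumption about the 1988 wording, but it does produce the behavior described: the value rises when pulse amplitude shrinks. The function name and the sample values are hypothetical.

```python
def fpa_value(systolic, diastolic):
    """One minus the pulse amplitude divided by the mean of the two levels.

    Assumes the mean level is not zero.
    """
    pulse_amplitude = systolic - diastolic
    mean_level = (systolic + diastolic) / 2
    return 1.0 - pulse_amplitude / mean_level


# Same mean level, two different pulse amplitudes (arbitrary units).
print(fpa_value(6.0, 4.0))  # larger pulse amplitude, lower value
print(fpa_value(5.5, 4.5))  # smaller pulse amplitude, higher value
```

Dividing by the mean level also makes the result a unitless proportion, so it does not depend on the gain of the photocell circuit.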

Their computer was a Terak 8510/8515 microcomputer, sporting a Digital Equipment Corp, LSI 11/02 processor and the RT-11 Version 3.B operating system. That system was reportedly equipped with 28K of memory and two (not one, but two) single sided eight-inch floppy disks. Wow. That's even more primitive than the DEC PDP 11/70-D systems, with RSTS/E operating systems, that I used in my day job as an undergraduate. Just as there is more computing power in the average automobile than the entire Apollo space program, there are more computing resources in our Palm Pilots and Blackberry devices than in that computer system. That says a great deal about the thoughtful and efficient coding that used to occur – these days we waste computing resources like people who are looking forward to Bill Gates' next upgrade.

Kircher and Raskin (1988) reported that all of the times and levels of the high and low points of tracing measurement provided all the information necessary to quantify all of the physiological features which they investigated, with the exception of the RLL measurement (Timm, 1982). (I'll get the Timm article today.) They discussed their investigation of several physiological features, including:

1) Amplitude: which they measured in relative units obtained between each low point and each succeeding high point. They stated “Amplitude was defined as the greatest such difference” (pg. 294).

That is my point: Kircher and Raskin measured not from stimulus onset, but from a low point to the succeeding high point, taking the greatest such difference.


Kircher and Raskin (1988) also investigated other features, including:

2) Rise Time: measured in seconds from the stimulus onset to maximum response amplitude
3) Half Recovery Time: time from maximum amplitude to the point at which the tonic response value was one-half that of maximum
4) Full Recovery Time: seems obvious
5) Duration to Half Recovery Time: measured from stimulus onset
6) Rise Rate: Amplitude divided by Rise Time
7) Half Recovery Rate: half of Amplitude divided by Half Recovery Time
8) Full Recovery Rate: Amplitude divided by Full Recovery Time
9) Area from Response Onset to Half Recovery: sum of differences obtained by subtracting the level at response onset from each subsequent measurement level to the point of Half Recovery (sounds a little like plotting an ROC curve)
10) Area from Response Onset to Full Recovery: same as above, only to full recovery
11) Electrodermal Burst Frequency: time between each low point in the SC/EDA curve and the second low point that followed. Measured as the reciprocal of the shortest interval obtained, with other procedures when only two points existed or no response existed.
12) and finally Respiration Line Length (Timm 1982): linear distance of each ½ second sample from question onset to 10 seconds after onset.
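As an illustration of the last item, here is a minimal RLL sketch. It assumes the respiration tracing is a list of samples taken every ½ second starting at question onset, and it takes the line length to be the sum of the absolute changes between successive samples over the first 10 seconds. That is a simplification and an assumption on my part: a length measured on paper would also include the horizontal travel, and Timm (1982) should be checked for the exact definition. The function name is hypothetical.

```python
def respiration_line_length(samples, sample_interval=0.5, window=10.0):
    """Sum of absolute changes between successive samples, from question
    onset (samples[0]) through `window` seconds after onset."""
    count = int(round(window / sample_interval)) + 1  # points in the window
    points = samples[:count]
    return sum(abs(later - earlier)
               for earlier, later in zip(points, points[1:]))


# Hypothetical tracings: normal breathing versus suppressed breathing.
normal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
suppressed = [0.0, 0.3, 0.0, 0.3, 0.0, 0.3, 0.0]
print(respiration_line_length(normal))
print(respiration_line_length(suppressed))  # shorter line length
```

Shallower or slower breathing both shorten the line, which is why a single length can stand in for several respiration features at once.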

Kircher and Raskin (1988) reported that area measures were redundant with amplitude measures. They further reported that rise rates of the BV and FPA were dropped due to low reliability and validity. They did not retain HRT as it was similar to FRT, which was less correlated with SC/EDA. They report that EBF was not correlated with the criterion and was negatively correlated with SC/EDA.

Kircher and Raskin (1988) used discriminant function and likelihood function to calculate a discriminant score and conditional probability for each subject (N=100, with 50 guilty and 50 innocent subjects). They reported five variables that maximized predictive likelihood: Amplitude and FRT for SC/EDA; EBF and Amplitude for BV; and Respiration Line Length. They reported decision accuracy levels of 97% and 94% (excluding inconclusive) for the computer model and numerical scores from an expert examiner.

A cross validation study used data obtained from Rovner, Raskin, and Kircher (1979). Overall accuracy obtained with the cross validation sample was 79% and 85% for the computer and numerical scoring, with the computer yielding more INC decisions. One-half of the sample (N=48) had been given countermeasure instructions prior to testing. Interestingly, the RLL correlations for the cross validation study were lower for the computer (-.55 vs. -.27) and almost statistically significant compared with .60 and .55 for the numerical scores. Humans still do a better job handling marginal or countermeasure data.

Kircher and Raskin (1988) reported that skin conductance amplitude was “the most useful measure for discriminating between truthful and deceptive subjects,” and cited other studies that offered similar conclusions (Barland & Raskin, 1975; Bradley & Ainsworth, 1984; Bradley & Janisse, 1981; Kubis, 1973; Podlesny & Raskin, 1978; Raskin & Hare, 1978). They reported correlations of .77 and .82 for computer generated scoring. They report Blood Pressure Amplitude correlations of .61 and .66 for their standardization and cross validation studies, along with RLL correlations of -.55 and -.27 (with a cross validation group of which one-half had been instructed in countermeasures). Kircher and Raskin (1988) also report correlation values of .61 and .73 for SC/EDA data in numerical scoring procedures, along with Blood Pressure Amplitude correlations of .53 and .70 and respiratory response scores (not RLL) of .57 and .60. Kircher and Raskin report the numerical scoring correlation for vasomotor (FPA) activity as .60 and .55. They do not provide correlation score for FPA using their computer scoring methods.

A review of the Utah Numerical Scoring System, (Bell, Raskin, Honts, & Kircher, 1999) reveals this: “The amplitude of a reaction is defined as the greatest difference between any low point and subsequent high point that occurs within the scoring window...” regarding EDA reactions. This would seem different from instructions to score from the value at question onset to the maximum amplitude value within the scoring window.

Similarly, Bell, Raskin, Honts, and Kircher (1999) wrote, regarding cardio reactions, “The numerical score is based primarily on the largest rise in baseline that occurs within the scoring window.” Again, this would seem different from a more concise instruction to score from the value at stimulus onset to maximum amplitude value within the scoring window.


I have other questions about the handling of RLL values in upper and lower pneumos, so be warned.


r


------------------
"Gentlemen, you can't fight in here, this is the war room."
--(from Dr. Strangelove, 1964)


Barry C
Member
posted 09-18-2006 11:03 AM
Can you post the original Kircher study? I never found a copy. I emailed him about it once (as I have similar questions), but never heard back from him. Dr. Raskin does answer emails rather quickly, but I haven't asked him about the Kircher Extract program (which I have and use).

The short answer about the .5 second issue is that Dr. Kircher had found that it takes at least that long for the brain / body to do its thing and react (in the EDA, and, I think, cardio), so reactions before that can't be due to the question / stimulus.

The 10 second RLL is an arbitrary number. You can use any (which I should qualify, but won't), and the DoDPI criteria is very strange in that regard. They use a different time period for each scoring pair of questions, and they don't necessarily start at question onset either; although, that could have changed in the 2006 Handbook.

OSS wasn't meant to be perfect. (What scoring system is?) It was meant to be objective. A few features that consistently correlate with deception were used to create Extract, and the OSS creators were stuck with what that produced for data. The end result was a scoring system designed to meet Daubert standards.

I've forgotten all your questions. I'll come back later when I have more time and try again.


Barry C
Member
posted 09-18-2006 11:11 AM
I should add this looks like a score of zero in the Utah system. I can't see where the first CQ EDA begins, so it's hard to say. In the DoDPI system, it might be a -1, but I don't want to count those tiny chart divisions. (I'm looking at the scoreable portion, and I'm assuming the first CQ started before .5 seconds after the question onset. From there, the "bigger is better" principle would apply, and I think the RQ wins.)


rnelson
Member
posted 09-18-2006 01:23 PM
Thanks Barry,

I wasn't really asking about the .5 second issue. But I think it's interesting that Kircher didn't do that with the other channels. It's also interesting that the scoring window has been subject to variable interpretation since Kircher and Raskin (1988).

I agree that 10 seconds of pneumo appears arbitrary. It seems to me that there is often diagnostic information occurring out to 15 or 20 seconds after the question onset.

I'm scanning the Kircher study for you. I also have the Timm (1982) study, and a couple other interesting things.

It is my understanding that OSS was built on Kircher features because they were derived empirically, subject to cross validation, and are based only on repeatable measurements.

Kircher features are also the core of the Utah system.

My question was really about the selection of the starting measurement point. Krapohl and McManus (1999) and Dutton (2000) describe the measurement from the stimulus onset, while Kircher and Raskin (1988) - in the study that defined the Kircher features - measured the maximum amplitude that occurred after a low point in the tracing. Think about it, and look at C5. According to Kircher and Raskin's procedure, that would be the low point just after the question end and just below the answer. However, according to Krapohl and McManus (1999) and Dutton (2000), they would score from question onset to the maximum amplitude (after the answer). That is quite a difference. The tracing descends for about one second after question onset, and then ascends from that point. I think a lot of examiners would score this from that lowest point after onset to the highest point after answer.

R6 is a better example of a tracing that descends before ascending, resulting in different measurement values than those which would be obtained if scored from the value at question onset.

According to Utah rules, I get

Pneumo +1 (slight baseline arousal during question C7, along with slowing after answer - nothing in R6). It is just a little suspicious looking, but the movement is steady and the upper and lower pneumo remained synchronized. Plus, there is nothing noteworthy in the rest of the charts.

EDA +1 (due to the magnitude of reaction at C7 and duration and complexity - compared with the much more simple reaction at R6)

Cardio -1 (R6 shows 1 1/2 chart divisions for 3/16", compared with one chart division at C7, or 1/4", and just over one chart division at C9). I know the reactions at C7 and C9 occur over a longer duration, but the R6 is clearly (to me) a more intense reaction.

That is my question, look at the measurement values. The Krapohl and McManus version of the Kircher measurement seems to ignore some sympathetic reaction data.

Is this correct?

I'd like to know more about the Extract program. Does it work with Limestone? I have a lot of data from a beloved Axciton system. The accuracy of the program's measurements is in part a programming complexity issue (it's a lot simpler to measure to maximum value from the stimulus onset value than to find the lowest value as Kircher and Raskin (1988) described). So we've arrived at another one of those black-box issues, where we need to know the secret handshake to be allowed to look inside and know how the dad-gum thing works.


Poly761
Member
posted 09-18-2006 06:38 PM
Need username/password, new computer.

Thanks.....


rnelson
Member
posted 09-18-2006 07:45 PM
Users of this forum can use

username: polyguest
password: torquemada

I recently moved the site to a new server. I thought it moved seamlessly and flawlessly, but I was informed today my main email is down.

I can always be reached at raymond.nelson@gmail.com

r


Poly761
Member
posted 09-19-2006 12:01 AM
I have always scored pneumo/cardio responses beginning at the point the question is answered. EDA changes have been evaluated if they begin 5-8 seconds prior to the start of the question. Question recognition is also considered.

(6-7), pneumo. I need more chart for C7 to evaluate as there are only two breathing cycles showing after the question is answered. Are the two cycles showing the start of a 4-breath ascending cycle? I would score a +2 if this cycle is present as it is in C5.

(6-7), EDA, -1. As I've indicated above the response begins too late in R6. While the degree (sympathetic response) is not as pronounced in R7, look at the duration of this response at R7.

(6-7), Cardio, 0. Being conservative in my scoring I don't see enough change from the point of the answer to score one question more/less than the other. Absent any other indices I can't agree one-half chart division is sufficient to call. I can't see C9.

END.....


Barry C
Member
posted 09-19-2006 08:30 AM
Ray,

quote:
It is my understanding that OSS was built on Kircher features because they were derived empirically, subject to cross validation, and are based only on repeatable measurements.

Yes, that's true, but OSS only looks at amplitude increases in the cardio and EDA and RLL in the pneumos, so it misses some diagnostic criteria, e.g., duration.

The 10 second criteria is - to the best of my recollection - a relatively arbitrary figure. When scoring manually, Don used to say pick a number out to, I think, 15 seconds. Whatever you use, be consistent though. I've toyed with a number of windows without much difference. Personally, I like about 12 seconds as I think it gets a little more diagnostic data, but that's just an intuitive opinion.

I created an Excel spreadsheet which took the Extract data in Bi-Zone tests and computed a score using the OSS computations. I would come up with cut-offs, but I asked for confirmed case data and didn't get enough to finish it. When I score manually (a long task for OSS), I score from the lowest point to the highest within the scoring window, and my hand scores have always been very similar to the computer scores.

I don't know if Extract works with Limestone. I can send it to you and you can toy with it and see. Let me know.

I don't remember the answer to your question as it's been so long since I looked at that stuff. For the most part I read the Krapohl and Dutton material, so I might not have had all the relevant info anyhow. Perhaps Don might chime in here soon.

The bottom line is it probably doesn't matter (too much) how you do it. We know there are three "validated" scoring systems: Utah, DoDPI and Backster. Utah and DoDPI perform about equally well, with Utah in the lead if a decision must be made. Backster looks at a lot of criteria that have nothing to do with deception, e.g., an "ascending staircase." However, because they look at the same criteria for each question type, they usually end up making the right call. They just do so a little less often than the others do, but when you add noise to the criteria, that's expected.

I suspect the same is true in this case. When we look at only a few data points, it might make (apparently) a big difference, but in the end, I suspect the result will be the same. With that said, I am curious to hear the answer to your question.

quote:
According to Utah rules, I get

Pneumo +1 (slight baseline arousal during question C7, along with slowing after answer - nothing in R6). It is just a little suspicious looking, but the movement is steady and the upper and lower pneumo remained synchronized. Plus, there is nothing noteworthy in the rest of the charts.

EDA +1 (due to the magnitude of reaction at C7 and duration and complexity - compared with the much more simple reaction at R6)

Cardio -1 (R6 shows 1 1/2 chart divisions for 3/16", compared with one chart division at C7 , or 1/4" and just over one chart division at C9) I know, the reactions at C7 and C9 occur over a longer duration, the R6 is clearly (to me) a more intense reaction.


Pneumos: The question is which line length is shorter. C5 has an increase in rate during the question (not a reaction). That adds a lot of line length. The slowing after the answer might have made up for it, but I don't want to look that closely. Note: OSS would only see the reaction (increased rate) that takes place before the answer. However, OSS outperforms hand scorers. Go figure.

EDA: If C5's reaction starts .5 seconds after question onset AND it is 1.5 times larger (i.e. greater in amplitude), then it's a +1 by Utah rules.

Cardio: if it is 1.5:1 for the RQ, then it's a -1.

Poly761,

quote:
I have always scored pneumo/cardio responses beginning at the point the question is answered. EDA changes have been evaluated if they begin 5-8 seconds prior to the start of the question. Question recognition is also considered.

What scoring system has a scoring window that begins at that point? When you say EDA responses that occur before the start of the question, do you mean on-time reactions that appear to lag behind on an analog system? You're seeing real time on the computer. All those reactions are timely, except perhaps the EDA in C5. I can't see where it starts as "1.63" is in the way.

Your question is what provokes the response. The guy's answer has little to nothing to do with it. You're ignoring some good data - and some would argue, the best data.


rnelson
Member
posted 09-19-2006 10:00 AM
You can see the entire chart here.
http://www.raymondnelson.us/qc/090917.html

The chart is fairly clean, though there is some suspicious respiration activity at a control in the second chart.

The subject has completed approximately 15 polygraph examinations.

Poly761 wrote

quote:
I have always scored pneumo/cardio responses beginning at the point the question is answered. EDA changes have been evaluated if they begin 5-8 seconds prior to the start of the question. Question recognition is also considered.

I'm not sure what you mean by this. Are you saying you score reactions that occur or begin before the stimulus? Or do you mean after the start of the question?

I think Gordon (1999) teaches a system that scores pneumos after the answer, but I'm not that familiar with his system. I've been curious about it for a while, because of his ranking scheme - a common method for stabilizing the variability of some types of data. Gordon's (1999) pneumo measurement scheme deserves more investigation, as we still, in my view, lack a satisfactory way of normalizing those values. (more later). It would be very interesting to see more data regarding this system; Gordon (1999) and Gordon and Cochetti (1987) are mainly descriptive.

Are you suggesting that you wouldn't score the Pneumo to C5, or that you need two comparisons to achieve a score? Or perhaps you're using Nate Gordon's rank order (Horizontal) scheme that doesn't evaluate individual spots.

quote:
(6-7), EDA, -1. As I've indicated above the response begins too late in R6. While the degree (sympathetic response) is not as pronounced in R7, look at the duration of this response at R7.

Also, I think you might be referring to C7, as there is no R7. If so, I have two questions/comments/responses.

First, to suggest the reaction at R6 is too late begs discussion. That reaction occurs during the question, and before the answer. If it is our objective to monitor sympathetic nervous system reactions that are correlated with deception and associated with a particular stimulus, it seems rather arbitrary to suggest that a reaction which occurs during that stimulus is not associated with that stimulus. What else could it be more associated with? It seems that some of our rules, while intended to prompt conservative judgement, are quite arbitrary, and get us thinking in a manner that may be negligent of the empirical foundations of our test.

Second, it is my understanding that amplitude of response is the primary consideration, and that duration and complexity are secondary. Which is why I see the responses at C5 and R6 to be so interesting a discussion point.

For example: Gordon (1999) describes the EDA measurement procedure as "Measurement of the electrodermal tracing is performed by multiplying the height by the base of the tracing." He does not explain the rationale for that procedure, but it seems clear and succinct. However, he does not define the "base," which could be assumed to be the EDA value at stimulus onset (or .5 sec after), or it could be assumed to be the minimum value following the stimulus onset, or it could be the low point preceding a high point.

Gordon (1999) also described the cardio measurement procedure: "The cardiograph tracing measurement is established by drawing a 20 second straight line out from the bottom of the cardiograph tracing at the beginning of the question. A measurement, in millimeters, is made determine the height of any changes that occur in the baseline of the cardiograph above that line." This may provide some insight into determining the base for EDA measurement, but assumptions are unwise, and it may be necessary to contact Nate for clarification.

The more important point is that Gordon's (1999) procedures may not attempt to exploit all of the available sympathetic/ANS response data. Again, consider if the tracing segment descends before ascending. Do we score from the low point or the stimulus onset value.

Dutton (2000) clearly indicates that we score from the stimulus onset value, and that we assign zero if the tracing never rises above the stimulus onset value. Keep in mind that scoring procedures are based upon some assumptions about the meaning of the tracing movements - that descending cardio tracings represent parasympathetic activity (while descending EDA tracings are indicative of reabsorption, not parasympathetic activity). Increasing cardio and EDA tracing activity is indicative of sympathetic/ANS activation (through two different neurotransmitters - epinephrine and acetylcholine).

Backster, genius though he may be, is not correct about everything. To characterize descending tracings as "relief" represents a gross and inaccurate oversimplification of physiology that misguides our interpretation of the meaning of test data. Similarly, "adrenal exhaustion" (a term that is thankfully not used much any more) as an explanation for flattening of EDA with successive charts is wholly unsatisfactory - because there is no adrenaline in the skin or the neurons in the skin.

Now, ascending tracing segments are always assumed to be associated with sympathetic/ANS activation, and this is true for every ascending segment. Why then does Dutton (2000) not allow us to exploit data from every ascending segment? I don't know enough about the Kircher Extract program, but Kircher and Raskin measured not from stimulus onset, but from the low point. To suggest that reactions which occur after a descending segment but do not exceed the value at stimulus onset are not interpretable imposes two assumptions that require argument or explanation: 1) that such ascending segments may not be associated with sympathetic/ANS activity, and 2) that we presently understand the empirical meaning of the particular measurement at stimulus onset. Neither of those assumptions seems worth arguing.

What I'm arguing is that we consider our rules and guidelines in the context of what we know about basic physiology. To suggest that some ascending reaction segments are associated with the stimulus and some others aren't requires that we have plausible rationale for such assumptions.
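
The two measurement conventions under discussion can be put side by side in a few lines of Python. This is only a sketch under stated assumptions: the function names and the sample tracing are hypothetical, the tracing is treated as a plain list of sampled levels, and neither function is claimed to reproduce what Extract or any published procedure actually computes.

```python
def amplitude_from_onset(samples, onset_index):
    """Peak rise above the level at stimulus onset.

    Returns 0.0 when the tracing never rises above the onset level,
    which mirrors the zero-assignment rule described by Dutton (2000).
    """
    window = samples[onset_index:]
    return max(0.0, max(window) - samples[onset_index])


def amplitude_from_low_point(samples, onset_index):
    """Largest rise from any post-onset low point to a later high point."""
    window = samples[onset_index:]
    lowest_so_far = window[0]
    best_rise = 0.0
    for value in window[1:]:
        lowest_so_far = min(lowest_so_far, value)
        best_rise = max(best_rise, value - lowest_so_far)
    return best_rise


# Hypothetical tracing: descends after onset, then ascends,
# but never gets back above the onset level.
tracing = [5.0, 4.0, 3.0, 3.5, 4.5, 4.75, 4.25]
print(amplitude_from_onset(tracing, 0))      # 0.0
print(amplitude_from_low_point(tracing, 0))  # 1.75
```

On this made-up tracing the onset convention records nothing, while the low-point convention records the whole ascending segment, which is exactly the data at issue in this thread.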

quote:
(6-7), Cardio, 0. Being conservative in my scoring I don't see enough change from the point of the answer to score one question more/less than the other. Absent any other indices I can't agree one-half chart division is sufficient to call. I can't see C9.

I can understand calling this zero, with only 1/8" (1/2 chart division) of difference in response amplitude against a reaction of greater duration.

Just keep in mind that any objective or computer scoring system will generally attempt to exploit such minor differences, as long as they are reliably measurable. OSS reduces the influence of minor differences through the use of R/C ratio thresholds for assigning +/- response values from the common 7-position scale.

It would be very interesting for people to reference the scoring systems we base our judgements on.

------------------
"Gentlemen, you can't fight in here, this is the war room."
--(from Dr. Strangelove, 1964)

Barry C
Member
posted 09-19-2006 10:25 AM
I think they were limited by what Extract can actually extract. I don't think what it does is optimal, which is what you point out. There's no help file in the program, and I never heard back from Dr. Kircher when I had questions, so I can't confirm if it is doing what they do manually when scoring (everything) by hand in the Utah system.

John Kircher is the expert on physiology and why and how we score (or should score) what we see in our charts. Does anybody want to check with him?

Barry C
Member
posted 09-19-2006 10:35 AM
Here's an old Utah cheat-sheet I made: It's pretty close to all you need to know, but it is the cheater's version:

Specific-Issue:
The cut-offs for a specific issue test are +/- 6. The charts are scored after three charts. If a decision can be made, then the exam is complete. If inconclusive, then two more charts are run and scored. The total score (of all five charts) is the final score, and a decision of DI, NDI or INC / NO is then made.

Multiple-Issue:
The cut-offs for a multiple-issue test are +/-3 per spot after three charts. A decision of truth or deception is rendered for each question based on the spot-total of the particular question. Additionally, if the total score is +/-6 and all spot totals are either positive or negative (ignoring 0 scores), then the call for all questions is NDI or DI, respectively.
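
The cut-off rules above can be sketched as a small decision function. Treat it as an illustration only: the function names are hypothetical, the cut-offs are assumed to be inclusive (a total of exactly +6 counts as meeting the cut-off), and the three-chart versus five-chart procedure is not modeled.

```python
def specific_issue_decision(total, cutoff=6):
    """Decision from a total score; the inclusive boundary is an assumption."""
    if total >= cutoff:
        return "NDI"
    if total <= -cutoff:
        return "DI"
    return "INC"


def multiple_issue_decisions(spot_totals, spot_cutoff=3, total_cutoff=6):
    """One decision per question, from the spot totals of a multiple-issue test."""
    total = sum(spot_totals)
    nonzero = [s for s in spot_totals if s != 0]
    # Total-score rule: all nonzero spots share a sign and the total meets +/-6.
    if nonzero and total >= total_cutoff and all(s > 0 for s in nonzero):
        return ["NDI"] * len(spot_totals)
    if nonzero and total <= -total_cutoff and all(s < 0 for s in nonzero):
        return ["DI"] * len(spot_totals)
    # Otherwise each question stands on its own spot total.
    return [specific_issue_decision(s, spot_cutoff) for s in spot_totals]


print(multiple_issue_decisions([4, 2, 0]))    # ['NDI', 'NDI', 'NDI']
print(multiple_issue_decisions([3, -1, -4]))  # ['NDI', 'INC', 'DI']
```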

Utah ZCT Scoring System “Cheat Sheet”

7-Point Scale:
0 = no difference (or less than required ratio)
+/-1 = noticeable difference
+/-2 = strong, clear difference
+/-3 = dramatic difference AND the tracing is stable AND the stronger response is the largest on the chart for that physiological measure

NOTE: A score of +/-3 in any channel is rare.

Scoring Windows:
The response must begin after the question onset (immediately for cardio and breathing; 0.5 seconds for EDA; two to four seconds for plethysmograph) and within five seconds of the answer, unless the subject typically doesn’t react until five to eight seconds after answering. An otherwise timely reaction may be considered up to 20 seconds following the onset of the question.

Breathing:
At least two successive cycles of apnea, suppression, baseline arousal and / or slowing of rate (less heavily weighted); both channels are considered, but the final (single) score is based on either the abdominal or thoracic channel, or a composite of the two channels.

NOTE: Scores of 0 and +/-1 are most common; other scores are rare.

EDA:
Amplitude (2:1 = +/- 1; 3:1 = +/- 2; 4:1 = +/- 3)
Duration and complexity are considered (A clearly longer duration or greater complexity may increase the score from 0 to 1 or 1 to 2, but the amplitudes must be at least 1.5:1 and 2.5:1, respectively.)

NOTE: The EDA channel is considered unstable when many non-specific responses are observed throughout the chart.

Cardiovascular:
Amplitude (1.5:1 minimum)
Duration and complexity are considered (A clearly longer duration or greater complexity may increase the score from 0 to 1 or 1 to 2.)

NOTE: Scores of 0 and +/-1 are most common.

Finger Plethysmograph:
Amplitude reduction and / or Duration (no minimum required, but +/-2 maximum score allowed)

NOTE: a score of 1 or 2 may be assigned when duration is clearly longer even if there is little or no difference in amplitude reduction of the questions being compared.
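
As a rough illustration of how the EDA amplitude ratios above turn into a score, here is a minimal sketch. The function name and arguments are hypothetical. It assumes the usual sign convention (positive when the comparison response is larger, negative when the relevant response is larger), treats the duration/complexity adjustment as a single yes/no flag, and does not model the extra conditions attached to a +/-3 (stable tracing, largest response on the chart).

```python
def eda_score(relevant_amp, comparison_amp, clearly_longer_or_more_complex=False):
    """Map two EDA amplitudes to a score on the 7-position scale."""
    larger = max(relevant_amp, comparison_amp)
    smaller = min(relevant_amp, comparison_amp)
    if larger <= 0:
        return 0  # no response to either question
    ratio = float("inf") if smaller <= 0 else larger / smaller

    # Amplitude thresholds from the cheat sheet: 2:1, 3:1, 4:1.
    if ratio >= 4:
        magnitude = 3
    elif ratio >= 3:
        magnitude = 2
    elif ratio >= 2:
        magnitude = 1
    else:
        magnitude = 0

    # Duration/complexity may raise 0 to 1 (needs 1.5:1) or 1 to 2 (needs 2.5:1).
    if clearly_longer_or_more_complex:
        if magnitude == 0 and ratio >= 1.5:
            magnitude = 1
        elif magnitude == 1 and ratio >= 2.5:
            magnitude = 2

    sign = 1 if comparison_amp > relevant_amp else -1
    return sign * magnitude


print(eda_score(10, 20))        # 1
print(eda_score(30, 10))        # -2
print(eda_score(10, 16, True))  # 1
```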

Artifacts:

Any artifact may render a channel or an entire question unscorable. If a comparison question is not useable for scoring, then use the strongest, closest-in-time comparison question. Additionally, follow these guidelines for analyzing questions that include deep breaths or movements:

Deep Breaths:

If the examinee takes a deep breath just before question onset, then breathing should not be scored.

If a deep breath affects other channels, then those channels might be used for scoring: If the other channel’s reaction started before the deep breath, then the portion (of the other channel’s reaction) occurring before the deep breath may be used for scoring if that portion is larger than the reaction to which it is being compared. If the portion is smaller and is a comparison question, then another comparison question may be used.
If there are deep breaths elsewhere in the charts, especially where no questions were asked, and those deep breaths resulted in similar physiological changes (as the deep breath in question), then the reaction following the deep breath should not be scored. If there is no reaction following the deep breath, score very conservatively.

Movements:

If a movement distorts more than two successive (cardio) pulses after question onset, then the changes occurring after the movement should not be scored. The reaction preceding the movement artifact, if any, may be used for scoring purposes if the reaction is larger than the cardio reaction to which it is being compared. If only one or two pulses are distorted, then estimate what the reaction would have looked like had the movement not occurred, if possible.

References:

Bell, B., Raskin, D., Honts, C., & Kircher, J. (1999). The Utah Numerical Scoring System. Polygraph, 28(1), 1-9.

Raskin, D., & Honts, C. (2002). The Comparison Question Test. In Handbook of Polygraph Testing (pp. 1-48). San Diego, CA: Academic Press.

The "edit" was an update of the latest cheat-sheet I have. It is the same info that will likely appear in the updated AAPP Handbook, which has a target date of next January.

[This message has been edited by Barry C (edited 09-20-2006).]

ebvan
Member
posted 09-19-2006 11:00 AM
From my point of view, in this case:
If we can agree that ascending tracing segments are always assumed to be associated with sympathetic/ANS activation, then ascending tracing segments should be measured from the post-stimulus point at which they begin to ascend, unless we can tie them to some pre-stimulus activity. If the tracings are tied to pre-stimulus activity then they are artifacted and should be discarded. If they are not, it makes absolutely no sense to discard the additional tracing amplitude that we lose by deferring to the tracing where it crosses the point of stimulus onset.
I would like to hear an explanation as to why someone thinks we should discard this data.

This chart makes an interesting argument that I think we should resolve, because when this happens it isn't always as close as it is on this chart. But over the course of a full examination, if the 1/2 chart division we are talking about on this one question is all that I have to determine my final opinion, the examinee gets the benefit of the doubt.

One of my mentors said that spending too much time trying to tell the difference between pepper and fly crap could make you go blind, crazy, or both.

ebvan 8 days 5 hours and counting

rnelson
Member
posted 09-19-2006 12:02 PM
quote:
I created an Excel spreadsheet which took the Extract data in Bi-Zone tests and computed a score using the OSS computations. I would come up with cut-offs, but I asked for confirmed case data and didn't get enough to finish it. When I score manually (a long task for OSS), I score from the lowest point to the highest within the scoring window, and my hand scores have always been very similar to the computer scores.

I don't know if Extract works with Limestone. I can send it to you and you can toy with it and see. Let me know.


Thanks Barry.

That is very interesting. I'd like to try the extract program with my Axciton data, just to see for myself how the measurements work.

(I'm starting to sympathize with my parents, who had to watch me take everything apart as a kid.)

If you are correct, both you and the Extract program are using a different measurement procedure than that specified by Dutton (2000). I think the physiological and empirical rationale support your procedure, as it is most likely to exploit all available sympathetic/ANS response data.

I hadn't considered that Poly761 might be referring to scoring procedures for an ink-slinger (analog) instrument. If so, 5-7 seconds before question onset is the question onset for the EDA tracing. It just needs to be explained clearly.

I'll offer to trade you spreadsheets. I could take your BiZone Extract values and calculate the significance of that difference at a specified alpha threshold - or better yet a P-value (using a t-test for small samples) that tells you the lowest level of alpha at which you could consider that difference statistically significant. It's still ipsative, not normative, but it does bring us closer to Daubert requirements for estimating the likelihood of an erroneous test result (or accidental conclusion that the relevant and comparison scores are different). It can parse the significance of individual questions - now up to five RQs with that blasted screening stuff.

I've worked out a procedural algorithm that can test the significance of hand-scored results from three or seven position scales, using a t-test of rank ordered values.
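
For readers who want to see what such a small-sample comparison involves, here is a minimal sketch of a Welch-style t statistic using only the Python standard library. It is not the spreadsheet or algorithm described above: the function name and the measurement values are hypothetical, and treating the relevant and comparison measurements as two independent samples is an assumption. Turning the statistic into a p-value requires the t distribution, for example `scipy.stats.t.sf`, which is left out here to keep the sketch dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, df


# Hypothetical response measurements across three charts.
relevant = [12.0, 15.0, 18.0]
comparison = [6.0, 9.0, 12.0]
t_stat, df = welch_t(relevant, comparison)
print(round(t_stat, 3), round(df, 1))
```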

On the pneumo line length issue, just look at the measurements on the graphic

here
http://www.raymondnelson.us/qc/BP_EDA_question.jpg

not here
http://www.raymondnelson.us/qc/090917.html

Although, there is another niggling issue...

If people are intent on using RLL measurements, it is very important to use ratios, not raw measurement or difference values.

Look at this
http://www.raymondnelson.us/qc/060917_example1.jpg

and now this...
http://www.raymondnelson.us/qc/060917_example2.jpg

It is very important for people who want to use RLL to appreciate that the tracing amplitude setting can alter measurement values, and that it is not acceptable to compare measurements across or between the upper and lower pneumos. Similarly, the measured difference is meaningless, as that is also affected by the amplitude setting.

Compare R6 and C3.

In example 1 the upper pneumo measurements differ by 15.78mm and the lower pneumo measurements differ by 36.24mm.

In example 2 the upper and lower measurements differ by 25.26 and 23.62 - after a couple of amplitude adjustments.

The point is that you must use ratios. Ratios for upper and lower pneumo in example 1 are .91 and .86. For example 2 they are .91 and .87 - I assume the variability occurs because of the complexity of numerous transformations as the software measuring tool calculates and plots the length of the hypotenuse for every single data point, then translates that length to the screen display dimensions including pixel density and linear dimensions. It would display different measurements on a smaller or larger screen, and presumably when printed to paper. (I'll try that later.)

Ratios have the effect of normalizing the data values, and algebraically cancelling out the units, making a nice elegant way of comparing values without getting into apples and oranges with measurements.
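
A quick numeric check of the cancelling argument, as a sketch with made-up values. It assumes that a change of display scale or units multiplies both measurements by the same factor, which the example figures above suggest holds only approximately for on-screen line lengths.

```python
from math import isclose


def response_ratio(relevant, comparison):
    """Unitless ratio of two measurements taken on the same channel."""
    return relevant / comparison


# Hypothetical line-length measurements for one relevant and one comparison question.
relevant_mm, comparison_mm = 160.0, 176.0
scale = 1.5  # assumed common factor from a display or amplitude change

raw_ratio = response_ratio(relevant_mm, comparison_mm)
scaled_ratio = response_ratio(relevant_mm * scale, comparison_mm * scale)

print(isclose(raw_ratio, scaled_ratio))      # True: the common factor cancels
print(comparison_mm - relevant_mm)           # 16.0
print((comparison_mm - relevant_mm) * scale) # 24.0: the difference does not survive rescaling
```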


and finally...

quote:
I think they were limited by what Extract can actually extract.

That is absurd. These are computers.

I've already created an Excel spreadsheet to locate and measure all these features from the Limestone test data (I don't attempt to translate the measurements to screen display). Limestone has a very nice open approach to their data. I can't do it automatically, and have to locate each question and parameter in the data file - but it's quite possible. It's not that hard to search for minimum and maximum values and changes in slope.

There are people far more propeller-headed than I working at Johns Hopkins where they created the Extract program. They could do whatever they are asked.

And thanks for the Utah Cheat Sheet. I could easily .pdf it (with your name and contact info and a citation to Bell, Raskin, Honts and Kircher, 1999), and make it available for download and printing.


r

Barry C
Member
posted 09-19-2006 12:45 PM
When I say "they," I mean Dutton and Krapohl since they only used the program - they didn't create it. I'm in the same boat. I can't make it do anything else. It does have a few neat features, but no "Help" file.

As for the cheat-sheet, it needs a little work. That was an old draft. I could make a few changes and get it back out to you in Word or PDF. I might even have a good one on this computer somewhere.

I'll send Extract later.

Barry C
Member
posted 09-20-2006 07:44 AM
Ray,

I sent it, but I got back some type of error message. (RCPT TO:553 sorry, that domain isn't in my list of allowed rcpthosts (#5.7.1))

It's 1.2 meg. Is it too big? I changed the file extension from "exe" to "bak" to avoid having it blocked on either end of things. I'll try again, but I might have to mail you a CD. Suggestions?

rnelson
Member
posted 09-20-2006 08:06 AM
Barry,

1.2 megs isn't too big.

I moved my website to a new server this week.

I thought it had gone flawlessly, and I don't think I or anyone even noticed. However, the only glitch seems to be that my email address is down.

I'll work on it later today when I have a spare moment.

You can always use the google address

raymond.nelson@gmail.com

My faxes go to that address and it's fairly reliable.

I have a couple of articles to send you, but after scanning they are about 32megs each - which is too large for my SMTP server. I'll have to get some other utility to convert them or rescan them to a smaller format.

My apologies.


r

Barry C
Member
posted 09-20-2006 09:17 AM
I re-posted the latest "cheat-sheet" above, and I sent a Word copy out to you Ray. It has all the bold and italics, etc., to make reading a little more user friendly. I had originally shrunk it down and kept it handy when I couldn't remember all the rules. If people want to do the same, have at it.

Barry C
Member
posted 09-20-2006 09:18 AM
Oh yeah, Ray: you should have received the emails at the g-mail address.

rnelson
Member
posted 09-20-2006 10:00 AM
Barry, I got them. Thank you.

I .pdf'd the cheat and loaded it here.
http://www.raymondnelson.us/qc/Utah_ZCT_Scoring_System.pdf

It's two pages (quick, 100k download) - nice.

I'll also start another shorter thread for the Utah system and link. I think it's very important, so thanks for making it available and accessible.

Extract doesn't like the Limestone charts. Axciton is fine. I don't yet know about Lafayette. I have plenty of data from the Axciton system, so I'll have fun figuring out what the scored values mean.

I'm inclined to start with the assumption that the units of measurement are inches (as displayed where I don't know).

I'll send you a spreadsheet for significance testing as soon as I figure out what the vertical column measurements actually represent.

If you have any info on the vertical column measurement values that would help. You indicated you achieve hand measurements that are similar.

r

Barry C
Member
posted 09-20-2006 10:15 AM
It works with Axciton and Lafayette. I imagine it'll work with CPS as that's the system the Utah group created. Limestone I didn't expect it would touch.

As far as the columns go, I only use the last four.

TR = upper pneumo (thor. resp.)
AR = lower pneumo (abdom. resp.)
SC = EDA
BP = cardio

What the measurements are, I don't know, but as you pointed out, they don't matter as they are used to compute ratios. Scores are assigned based on the value of those ratios.

Have fun.

Poly761
Member
posted 09-20-2006 11:32 PM
Barry/RNelson

Regarding my thread on 9-19 at 12:01AM, I didn't explain my EDA evaluation adequately. I begin evaluating EDA responses 5-8 (can go to 12) seconds prior to the time my chart marking identifies the (start) of a question. This time-frame is used as the EDA pen is approximately 1/2" longer than the pneumo/cardio pens.

They are "on-time reactions that appear to lag behind on an analog system." I agree the question produces the EDA response. Responses are only evaluated within the time-frame that is established by the start of a question.

Barry, you state an "ascending staircase" has nothing to do with deception. What is the explanation for this type of response? What of the descending staircase? I don't understand your following statement:

"Backster looks at a lot of criteria that have nothing to do with deception, e.g., an "ascending staircase." However, because they look at the same criteria for each question type, they usually end up making the right call."

Are you referring to indices of deception as the "same criteria" being looked at for each question type?

Rnelson - Not being familiar with computerized instruments, I stated R6 (EDA) was late based on the criteria I use on an analog instrument as explained above. I don't look to "exploit" any minor differences in scoring. I'm not placing a negative on this term but cardio 6-7 is too close for me. What was the computer score for cardio 6-7?

What are OSS, SC and FRT?

END.....

rnelson
Member
posted 09-21-2006 01:11 AM
OK, it's really late, not quite 12:01 AM, but late enough that I need to be careful here.

Poly761,

I think you are correct that exploiting minor differences is not wise. Conservative judgment says - NFW (some version of don't do it).

OSS is the Objective Scoring System - based on empirically validated criteria that are reliably and mechanically measurable. It's a really good system; the only downsides are the time involved in scoring, and that it applies only to single-issue, three-question ZCT formats.

SC is skin conductance. Most instruments seem to measure conductance (or a hybrid of conductance and resistance) as a more stable measurement than resistance, making the term GSR somewhat arcane.

FRT - I'm not sure, maybe shorthand chart marking (missing vowel) for when your examinee is abusing your motion sensor.

Barry can answer for himself, but some of the major objectives of measurement systems (and numerical scoring is intended to be a measurement system, including the interpretation of ratios of difference in response magnitude) are: 1) that they be reliable (replicable), meaning that different evaluators using the same rules achieve the same measured answer, and 2) that those measured answers be mechanically based, rather than intuitively based. To that end, most, if not all, computer and empirically derived polygraph scoring (and measurement) systems have concerned themselves only with measurable features.

Things like staircases, as described by Backster and the now arcane USAMPS rules, are data features for which it is very difficult, if not impossible, to obtain reliable measurements. Think about it: how would you measure that? You could measure the time-period of staircase rise, but you'd have to specify what magnitude (measurement) of increasing amplitude constitutes a staircase, and how many cycles, how much variability, etc. Or, you could specify the rate of rise in the staircase, but that might be influenced by other variables like the sensitivity of the component, or the gain/amplitude setting of the instrument/software.

In actuality, both Backster and USAMPS systems have sometimes attempted to interpret data features that are qualitative signatures (the shape of the tracings), rather than quantitative magnitudes of response. It is a perfectly understandable place to start (40 years ago) the business of numerically scored polygraph testing. However, it may not be the place we want to finish our work, when considering a legal and clinical environment that emphasizes reliably measurable data that can be evaluated for statistical significance. That is the reason that Utah, the "defensible dozen" that is apparently currently being taught at DoDPI, and empirically derived systems, do not emphasize the signature or shape quality of the data, except to the degree that changes in data quality are both theoretically and empirically associated with measurement changes that have been empirically (statistically) shown to be correlated (mathematically) with deception.

Things like "suppression", "baseline arousal", and "slowing" are empirically associated with shorter RLL measurements, which have been repeatedly correlated with deception in published studies. Things like "staircases" and "changes in I/E ratio" are extraordinarily complex, and while they may be diagnostic, or correlated with deception, the difficulty they present for mechanical measurement (reliability) means they remain impressionistic (non-measurable) or signature (shape) features.

I have often said that if I had only one test that I could ever use - it would be the Rorschach (ink-blot) test. (I'm exposing my psychodynamic roots here.) Rorschach is the single most rich source of information ever devised in a testing format. It's also not very reliable (interrater agreement). But it gets you a ton of really great clinical information that is hugely diagnostic. Remember, the test doesn't make the diagnosis - the professional does - the test simply gives information. The Rorschach Test, as codified by Exner, is hugely complex, and takes much longer to learn than the polygraph.

Like the Rorschach, pneumograph tracings may actually contain the most "rich" variety of information. However, "richness" of information also means lower rates of interrater agreement, as different people are more likely to look at different things. Well established and clearly defined rules help, but are only a partial solution towards stabilizing the role of human/evaluator variability in complex evaluation systems. (Exner achieved good reliability with the Rorschach test only through mechanical interpretation rules that were empirically derived.)

Although RLL measurements themselves are not the most diagnostic criteria, they are the most reliable measurement of pneumographic activity - and it is axiomatic that reliability defines the upper limit of validity - measurements cannot be any more valid than they are reliable (interrater, in the case of polygraph).

Complexity is axiomatically associated with lower rates of reliability (interrater agreement) among human evaluators. Low rates of reliability prevent the establishment of adequate or satisfactory validity. Complexity is theoretically irrelevant with computer evaluation, though it is interesting that computer derived scoring systems seem to have emphasized simpler (not more complex) data features than human scoring systems.

Now, signature features have their place in forensic science; just think of hand-writing analysis and document analysis, both valid forms of investigation, but they require data (norms) to support their conclusions. However, research derived norms regarding polygraph tracing signature shapes have never been published in any scientific journal (one standard for acceptable science). Measurable criteria regarding polygraph reactions that are correlated with deception have been published in scientific journals, and that is why those features are called "defensible." The beauty of all this is threefold: 1) that those "measurable" criteria are actually simpler, 2) that those measurable criteria actually produce results that are as accurate or more accurate than more complex systems, and 3) their simplicity leads to better reliability (interrater agreement), which will lead ultimately to better acceptance among the scientific community.

Alright, enough of this. It's late, and I'm ranting. I've got sleeping to do.

Poly761, thanks for clarifying the ink-slinger/analog thing - some instruments are 7 seconds offset, others are 5 seconds.

Niters.

r

Poly761
Member
posted 09-21-2006 11:16 AM
RNelson -

Again read your post and identified HRT/FRT:

7) Half Recovery Rate: half of Amplitude divided by Half Recovery Time
8) Full Recovery Rate: Amplitude divided by Full Recovery Time

How is recovery rate and recovery time determined?

END.....

[This message has been edited by Poly761 (edited 09-21-2006).]

Barry C
Member
posted 09-21-2006 04:53 PM
Ray covered it pretty thoroughly, but to cite examples, consider the cardio tracing: increased rate, decreased rate, increased pulse amplitude, decreased pulse amplitude, change in position of the dicrotic notch, PVCs, etc. Some say pretty much anything you see as a deviation from baseline is a reaction. (DoDPI's former scoring system said pretty much the same thing.) We know now, however, that is not the case. Any of those could be an indicator, but statistically, they are pretty meaningless. The up side is, though, that if you consider all those things consistently (for RQs and CQs), you'll usually make a correct call, but you won't do so as often as the Utah or current DoDPI systems.

I don't want to make this sound like Backster bashing as that's not the point. I still read his scoring rules because I think he has a lot to offer. It was Backster who said to look at cardio trends in the absence of a phasic response, and he was right. OSS does that, and it outperforms traditional hand scorers. Many people ignore them.

rnelson
Member
posted 09-21-2006 10:44 PM
quote:
7) Half Recovery Rate: half of Amplitude divided by Half Recovery Time
8) Full Recovery Rate: Amplitude divided by Full Recovery Time

How is recovery rate and recovery time determined?


Good question. I knew someone would wonder about this - I tried to summarize it above, but it's buried in the veritable volume of very vigorous verbiage and verbosity.

paraphrasing Kircher and Raskin (1988) in my initial post

quote:
After identifying low points, Kircher and Raskin (1988) described that they then identified the low points of the SC/EDA response curve, after which they apparently isolated the high points of each segment between low points and then isolated the exact times and levels of the low points.

Kircher and Raskin (1988) reported that all of the times and levels of the high and low points of tracing measurement provided all the information necessary to quantify all of the physiological features which they investigated, with the exception of the RLL measurement (Timm, 1982).


This says a whole lot. By testing each tracing segment for slope (tracing going up = positive slope, tracing going down = negative slope), they located the amplitude (height, or Y-axis) measurement of each high and low point and determined the location of those points on the time-scale (X-axis). They apparently did not use every data point along the tracing - which means they did not attempt to measure or use information from the complexity or shape of the tracings in their data analysis.
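
The slope test described here is easy to sketch. What follows is only an illustration of the idea, not Kircher and Raskin's code: the function name, the sampling rate argument, and the sample values are hypothetical, and a flat run is simply skipped so that the turning point is placed at the end of the flat stretch.

```python
def turning_points(samples, sample_rate_hz):
    """Return (time_s, level, kind) for each local high and low point."""
    points = []
    previous_slope = 0  # +1 rising, -1 falling, 0 not yet known
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        slope = (delta > 0) - (delta < 0)
        if slope == 0:
            continue  # flat segment: wait for the next real change
        if previous_slope != 0 and slope != previous_slope:
            # Slope changed sign, so the previous sample was a turning point.
            kind = "high" if previous_slope > 0 else "low"
            points.append(((i - 1) / sample_rate_hz, samples[i - 1], kind))
        previous_slope = slope
    return points


# Hypothetical tracing sampled once per second.
tracing = [2.0, 1.0, 0.5, 1.5, 3.0, 2.5, 2.0, 2.2]
print(turning_points(tracing, 1))
# [(2.0, 0.5, 'low'), (4.0, 3.0, 'high'), (6.0, 2.0, 'low')]
```

Only the times and levels of the turning points are kept, which matches the observation that the shape of the tracing between those points is not used.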

This is what they say about how recovery rate and time are determined.

quote:
Half Recovery Time. Time of occurrence of the maximum amplitude was subtracted from the time at which the recovery limb was half the obtained amplitude. When the response did not recover sufficiently to reach the criterion, the interval was measured to the end of the 20-s sampling period.

Full Recovery Time. Full recovery time was obtained in the same manner as the half recovery time except the endpoint was the time at which the response fully recovered to the baseline level.


There is not much more to it than that, except HRT was closely correlated with FRT and SC. FRT is essentially what we call "duration and complexity." The complexity, while interesting, presents measurement reliability problems, so the time to recovery seems to be a reliable solution.
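The definitions quoted above can be sketched as follows. This is a rough illustration under stated assumptions, not the published algorithm: the baseline is taken as the lowest level at or before the peak, the peak is the highest level in the window, and the function name `recovery_features` and the example values are hypothetical.

```python
def recovery_features(times, levels, window_end=20.0):
    """Times are seconds from question onset; levels are chart units."""
    pts = [(t, y) for t, y in zip(times, levels) if t <= window_end]
    # Peak = highest level in the window (first one if there are ties).
    peak_i = max(range(len(pts)), key=lambda i: pts[i][1])
    peak_t, peak_y = pts[peak_i]
    # Assumption: baseline is the lowest level at or before the peak.
    base_y = min(y for _, y in pts[: peak_i + 1])
    amplitude = peak_y - base_y

    def time_to(level):
        # Time from the peak until the recovery limb falls to `level`;
        # if it never does, measure to the end of the sampling period.
        for t, y in pts[peak_i + 1:]:
            if y <= level:
                return t - peak_t
        return window_end - peak_t

    half_time = time_to(base_y + amplitude / 2)
    full_time = time_to(base_y)
    return {
        "amplitude": amplitude,
        "half_recovery_time": half_time,
        "full_recovery_time": full_time,
        "half_recovery_rate": (amplitude / 2) / half_time if half_time else 0.0,
        "full_recovery_rate": amplitude / full_time if full_time else 0.0,
    }


# Hypothetical response: peaks at 5 s, recovers halfway, never reaches baseline.
times = list(range(21))
levels = [2, 2, 3, 5, 8, 10, 9, 8, 7, 6, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]
print(recovery_features(times, levels))
```

In this made-up example the response never returns to baseline, so the full recovery time runs to the end of the 20-second period, as the quoted rule specifies.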

It is important to keep in mind that recovery in SC and BP represent two different physiological mechanisms: reabsorption and parasympathetic recovery. The general term "relief" seems to be taught in polygraph schools, and you hear things like "you can't react and relieve at the same time." Well, maybe... that might be an oversimplification of physiology. In actuality, sympathetic activity, parasympathetic activity, and reabsorption are occurring all the time - simultaneously. So, once again it is important to keep in mind when we are speaking metaphorically, and to be careful about taking our metaphors and assumptions too literally.

I'm still not completely sure about the answer to my initial question, but Barry indicated he measures from minimum to maximum values. That's essentially what I was taught regarding the Utah system. I'm curious what others do.

Timm (1982) described the RLL tracing and the idea of placing RLL measurements in rank order (which begins to solve the problem of normalizing those values). Timm (1982) also described his EDA (what he called SRR) measurements, and wrote "one of the procedures to score the electrodermal measurements was to measure the vertical rise of the largest wave occurring from the onset of the stimulus question until 15 seconds had transpired." Timm (1982) further wrote "electrodermal patterns were also scored by measuring the highest point they reached during the 15 second interval. This was accomplished by measuring the height in millimeters of a vertical line drawn from the highest point reached by the pen (during each time interval) to the bottom of the chart paper." Timm apparently ranked those measured EDA values, and used an electrodermal scoring system proposed by Lykken (1959). (I'll have to look for those Lykken 1959 and 1960 articles when I get out of court tomorrow.) Timm does not seem to discuss how he measured cardio values.
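As an illustration of rank ordering, the sketch below converts raw measurements into ranks within one chart. The direction of ranking (1 = smallest) and the handling of ties (tied values share the mean rank) are assumptions made for the example, not details taken from Timm (1982) or Lykken (1959), and the millimeter values are made up.

```python
def rank_values(values):
    """Rank 1 = smallest value; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Hypothetical EDA heights in millimeters for four questions on one chart.
heights_mm = [12.0, 30.5, 12.0, 7.0]
print(rank_values(heights_mm))  # [2.5, 4.0, 2.5, 1.0]
```

Because only the order matters, the ranks stay the same if every height is multiplied by the same factor, which is why ranking begins to normalize the values.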

r

------------------
"Gentlemen, you can't fight in here, this is the war room."
--(from Dr. Strangelove, 1964)

IP: Logged

Poly761
Member
posted 09-22-2006 01:45 AM
Barry -

You stated an "ascending staircase" has "nothing to do with deception." So we're on the same page I'm relating this to respiration. In my experience I have only observed this suppressed pattern in a CQ or RQ; not between questions or in a calibration test.

What are any of the indices we use if not simply "indicators" (of deception) as you state in another of your threads? I don't see how they can statistically be "meaningless;" and, at the same time if we consistently use them we "usually make a correct call."

In reading your thread it appears the Utah/DoDPI systems are using different indices than those you cited, 9-21 @ 4:53. Is this correct?

END.....

IP: Logged

Barry C
Member
posted 09-22-2006 06:01 AM
If you've got a pneumo that gets bigger (ascending staircase pattern) - rather than smaller (suppression) - then you aren't looking at a reaction that correlates with deception.

quote:
What are any of the indices we use if not simply "indicators" (of deception) as you state in another of your threads? I don't see how they can statistically be "meaningless;" and, at the same time if we consistently use them we "usually make a correct call."

Let's say we score amplitude decreases in the cardio. (They don't correlate with deception.) Because we consider them in both the RQs and CQs, those non-reactions tend to balance out, because they appear rather equally in both types of questions. They are statistically meaningless because, as I said, they don't correlate with deception. In other words, they better our chances of doing what we are supposed to be doing. (They do, however, reduce our accurate calls, but usually not enough to get us to a wrong decision - but they could.)

quote:
In reading your thread it appears the Utah/DoDPI systems are using different indices than those you cited, 9-21 @ 4:53. Is this correct?

I don't know what you mean. DoDPI changed its system by removing the meaningless criteria (few used anyhow), and now they use almost all the same features Utah does. DoDPI doesn't yet use the PLE, so they don't have any criteria for that, but it's in the works. How they assign values to those features is different though. Does that answer your question?

IP: Logged

rnelson
Member
posted 09-22-2006 08:45 AM
I think it's important to keep in mind that the term "meaningless" is vague in this usage. It could be "not associated with the criterion," but not necessarily. Things we haven't discovered yet, which may be quite valid, are currently "meaningless." Things can be strongly associated with our criterion, but if we lack the technology or methodology to measure them reliably, they are also "meaningless." Additionally, "meaningless" could mean "correlated, but not at statistically significant levels." "Meaningless" generally means "not significant" or "not statistically significant." When we use the term "significant" in science, we are generally assumed to be referring to statistical significance - to do otherwise is to cause confusion, not clarity. When referring to some form of significance other than statistical significance, it is generally preferable to use a term that is less formally stipulated in its meaning - something like "substantial." It is also important to keep in mind that "not significant" is not the same as "meaningless." Things can be not significant due to random measurement error (always present), sample problems, research design flaws, or vagueness in construct validity. It is possible that some phenomena are "meaningless" in some studies and "significant" in others. It's science and mathematics - kind of like herding cats.

Here is the ASTM section on scoring criteria. It looks interestingly like the Utah criteria.

--------------

4.2 Numerical Evaluation:
4.2.1 Evaluators employing numerical evaluation shall first verify that the PDD recordings are suitable for evaluation. If they are not suitable, no evaluation shall be undertaken for the purpose of diagnosing truthfulness or deception.
4.2.1.1 Nothing shall preclude an evaluator from reporting evidence of countermeasures when this evidence exists.
4.2.2 There are four principal components to numerical evaluation. They are:
4.2.2.1 Identification of diagnostic tracing features.
4.2.2.2 Assignment of numerical values according to the relative intensity of the tracing features.
4.2.2.3 Computations based on the numerical values.
4.2.2.4 Decision rules that result from the computations.
4.2.3 While others may occur in individual cases, there are five empirically established diagnostic features in the respiration channel. They are:
4.2.3.1 Suppression of respiration amplitude.
4.2.3.2 Slowing of breathing rate (increase in cycle time, or bradypnea).
4.2.3.3 Change in the inhalation/exhalation time ratio.
4.2.3.4 Apnea.
4.2.3.5 Rise in the baseline of the respiration cycles. All of the diagnostic features in respiration, except the rise in baseline, are captured by a common metric, respiration line length.
4.2.4 There is one primary diagnostic feature in the electrodermal channel that has been empirically confirmed. It is electrodermal response amplitude.
4.2.4.1 There are two secondary diagnostic features:
(1) Response complexity.
(2) Response duration.
4.2.5 While others may occur in individual cases, there is one primary diagnostic feature in the cardiograph channel that has been empirically verified. It is the rise in the cardiograph tracing baseline.
4.2.5.1 There is one secondary feature: response duration.
4.2.6 There is one diagnostic feature in the photoplethysmograph that has been empirically determined. It is the decrease in pulse amplitude.

------------------
"Gentlemen, you can't fight in here, this is the war room."
--(from Dr. Strangelove, 1964)

IP: Logged

Barry C
Member
posted 09-22-2006 09:49 AM
I thought of that after I wrote it Ray. Good point. I fell into the trap of redefining terms. When I said "meaningless" I meant it is of no use to us as examiners from a practical perspective. The data on the non-scorable criteria was statistically meaningful in that those criteria appear to happen by chance alone - not deception, but we must remember that in science, everything is tentative as more data tomorrow can change the wisdom of today.

The ASTM criteria were based on Don Krapohl's best practices series in which he discussed features that are actually supported by research. The Utah folks recognized that a long time ago, and that is how they developed their system. They also did some more supporting research, which Don cited in his publication as well (I think). It was that work, among other things, that started the chain reaction of changes at DoDPI.

Does anybody recall the study that Norm Ansley did back a ways on what features appeared most often in charts? He looked at a lot of data, and came to the same conclusions on what's actually there. It was published in POLYGRAPH, and it's an interesting read.

IP: Logged

rnelson
Member
posted 09-22-2006 01:29 PM
Information can also be "meaningless" if it is redundant with other information, which is what Kircher and Raskin (1988) found with HRT and FRT values.

It's probably about time to put this topic to rest, but here is another screenshot.
http://www.raymondnelson.us/qc/060917_again.jpg

Look at the measurements at C5, R6, and C7, now with slight increases in sensitivity. The difference is now over one chart division. The measurements concern me, though they are correct according to the procedures described by Dutton (2000). R6 is clearly the greater reaction.

The point of this illustration is that the measurement values themselves are almost meaningless - they vary with sensitivity. Also, you could take any data value and transform it mathematically using squares, square roots, logs, or antilogs, and increase or decrease the variability and normality. It's a legitimate thing to do, and it's done all the time in research, when sifting through data does not initially make enough sense. Think about it: our cardio and EDA measurement values are not actually measurements of electrodermal or cardio activity, but of instrument activity. The measurements we take have very little to do with the actual physiology involved in EDA, breathing, and cardiovascular activity. Pressure is measured in mmHg - we don't ever record those values, and the vertical (Y-axis) amplitude change which we measure is uninformative regarding actual blood pressure or blood volume. Similarly, we don't record electrodermal conductance or resistance values, and the height of arousal of the tracing is electrically meaningless. Respiration would have to be measured volumetrically. My sadistic side says it might be fun to force the examinee to wear a face mask, and who wants to decrease examinee discomfort with some silly finger cuff - I want them to know they are taking a test.

So, our measurements are metaphors for physiological activity - measurable metaphors, but still metaphors - like the ancient story from India about the blind men who describe an elephant variously as a tree or rope, or Plato's myth of the cave.

The empirically responsible way of appreciating that our measurements are metaphors is to remain mindful that the measurement units themselves are uninformative. The goal then is to use those measurements in a normalized manner, without emphasis on the units themselves. Ratio scores (as in OSS), proportion values (as in Miritello, 1999), and rank values (as described by Lykken as early as 1959) are common ways of normalizing polygraph measurement data so as to avoid the empirical mistake of attempting to interpret the units themselves as if they are meaningful.
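A small sketch shows why these normalizations help. The raw numbers below are made up, and "gain" simply multiplies every measurement the way a sensitivity change would (an idealization that ignores any baseline offset). The helper names are hypothetical and do not reproduce the actual OSS, Miritello, or Lykken procedures.

```python
import math


def ratio(a, b):
    """Ratio of two measurements taken at the same sensitivity."""
    return a / b


def proportions(values):
    """Each value as a share of the total for the chart."""
    total = sum(values)
    return [v / total for v in values]


def rank_order(values):
    """Rank 1 = smallest value (simple version, no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


raw = [4.0, 10.0, 6.0]             # hypothetical amplitudes for C5, R6, C7
gain = 1.5                         # hypothetical sensitivity increase
boosted = [v * gain for v in raw]  # same reactions at higher sensitivity

# Raw differences change with sensitivity...
print(raw[1] - raw[0], boosted[1] - boosted[0])  # 6.0 9.0
# ...but ratios, proportions, and ranks do not.
print(ratio(raw[1], raw[0]), ratio(boosted[1], boosted[0]))
print(rank_order(raw), rank_order(boosted))
print(all(math.isclose(p, q) for p, q in zip(proportions(raw), proportions(boosted))))
```

The difference between R6 and C5 grows from 6 to 9 units when the gain goes up, while the ratio stays at 2.5 and the ranks stay the same.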

This is currently an important issue of vulnerability with professionals doing PPG testing (penile plethysmograph) who talk about increases of "3 points" or "10 points" but cannot answer questions about what those "points" represent.

r

------------------
"Gentlemen, you can't fight in here, this is the war room."
--(from Dr. Strangelove, 1964)

IP: Logged

Barry C
Member
posted 09-22-2006 01:43 PM
I'm going back and forth with Dr. Raskin, and he said what we already knew: you measure the reaction from low point to high point anywhere within the scoring window. I'm still waiting to hear if that's what Extract does. He was going to see if Dr. Kircher had anything to add, but he didn't seem to know what Extract was off the top of his head anyhow. It may be that it only extracts the "Kircher features," but how that's done may not be the way Kircher would do it. He did say OSS and the CPS / Utah algorithms do tend to be at odds sometimes, and you might have discovered the reason. Now I'm going to have to read those studies!
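The two readings being compared in this thread can be sketched side by side. This is only an illustration with made-up values: `onset_to_peak` follows the onset-based reading (zero if the tracing never rises above the onset level), and `low_to_high` follows the low-point-to-high-point reading. Neither function is taken from Extract, OSS, or CPS; both names are hypothetical.

```python
def onset_to_peak(window):
    """Rise from the level at question onset to the highest level in the
    window; 0 if the tracing never rises above the onset level."""
    return max(0, max(window) - window[0])


def low_to_high(window):
    """Largest rise from any low point to a later high point in the window."""
    best = 0
    lowest = window[0]
    for y in window[1:]:
        lowest = min(lowest, y)      # lowest level seen so far
        best = max(best, y - lowest)  # best rise ending at this sample
    return best


# Hypothetical tracing that descends after onset and then reacts.
dips_then_rises = [10, 8, 6, 7, 11, 14, 12]
print(onset_to_peak(dips_then_rises), low_to_high(dips_then_rises))  # 4 8

# Hypothetical tracing that reacts but never gets back above onset level.
stays_below = [10, 8, 6, 7, 9, 8]
print(onset_to_peak(stays_below), low_to_high(stays_below))  # 0 3
```

When the tracing dips before it rises, the two rules give different numbers for the same reaction, which may be one source of disagreement between scoring methods.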

IP: Logged

ebvan
Member
posted 09-22-2006 03:07 PM
Measurable metaphors. I have never heard polygraph tracings more concisely or accurately described.

I like that.

IP: Logged

rnelson
Member
posted 09-22-2006 09:24 PM
Thanks Barry,

I think it helps to have an authoritative reference on these concerns. How did you contact Dr. Raskin?

I'll spend some time this weekend hand-measuring some confirmed cases that I got from Axciton a while back. Then I'll run them through Extract. I could do a correlation or ANOVA, but the N is small, so maybe Pearson's. Actually, the math may not be necessary - we'll see.

All of this requires spare time, and I have important things to do, like dishes, and rescuing my again broken-down Subaru.

r


------------------
"Gentlemen, you can't fight in here, this is the war room."
--(from Dr. Strangelove, 1964)

IP: Logged

Barry C
Member
posted 09-22-2006 09:32 PM
He contacted me via email about something else, so I figured it was a sign and switched gears a little bit.

IP: Logged

J.B. McCloughan
Administrator
posted 09-22-2006 10:15 PM
Ray,

Is this a confirmed chart?

The EDA responses to the comparison questions are somewhat atypical and there still appears to be a significant response to the relevant (EDA and Cardio). Also, the cardio response in the relevant seems to correlate with the EDA but not so in the comparisons.

[This message has been edited by J.B. McCloughan (edited 09-22-2006).]

IP: Logged

Poly761
Member
posted 09-22-2006 10:28 PM
How did the pneumos score, C5-R6?

END.....

IP: Logged

Barry C
Member
posted 09-23-2006 10:40 AM
What do you mean by how'd they score? If you mean by looking at line length, go back to the chart and look at the measurements:
http://www.raymondnelson.us/qc/060917_again.jpg

You'll see R6 has shorter line length in both the upper and lower pneumos, which means the R6's reaction is slightly bigger than C5; however, if you did the math (remember ratios), you'd find the ratios aren't significant enough, and you'd get a zero.

Don't forget that you score (in Utah and DoDPI) to the largest CQ on either side of the RQ when an RQ is bracketed as R6 is. You'll notice C7 has the bigger reactions (shorter line length = more suppression), but again, the ratios would land you a zero as the difference is minimal (and not apparent to the naked eye as DoDPI requires).
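As a rough sketch of the bracketing logic described above: pick the stronger of the two neighboring CQs (for pneumos, the shorter line length), then assign a score only if the ratio between RQ and CQ is large enough. The threshold `min_ratio` is purely hypothetical, as are the function name and the line-length values; the real ratios used by Utah, DoDPI, or OSS are not given in this thread. The sketch only assigns -1, 0, or +1, with the usual sign convention that a stronger CQ reaction scores positive.

```python
def score_bracketed_rq(rq_len, cq_before_len, cq_after_len, min_ratio=1.25):
    """Score a pneumo RQ against the stronger adjacent CQ.

    Shorter line length = more suppression = bigger reaction.
    min_ratio is a made-up threshold for illustration only.
    """
    cq_len = min(cq_before_len, cq_after_len)  # stronger CQ of the two
    longer = max(rq_len, cq_len)
    shorter = min(rq_len, cq_len)
    if longer / shorter < min_ratio:
        return 0  # difference too small to score
    return 1 if cq_len < rq_len else -1


# Hypothetical line lengths loosely patterned on the discussion:
# R6 slightly shorter than C5, C7 shorter still, all close together.
print(score_bracketed_rq(rq_len=95.0, cq_before_len=100.0, cq_after_len=92.0))  # 0
```

With values this close together the ratio falls under the threshold and the result is a zero, whichever CQ is used.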

I don't have the OSS ratios in front of me (and Identifi's and Polyscore's are still hidden in the "black box," so who knows what they'd do with the data).

Is that what you were asking?

IP: Logged

Poly761
Member
posted 09-25-2006 10:03 AM
I'd like to learn what the computer score was for the pneumos, C5 to R6.

Define "line length."

What I see in C5 is a change in the rhythm & regularity of the breathing cycle that begins with the question being asked. This appears to be the "average" or "norm" as it is present before C5 and after R6.

Also, a small two cycle volume change just before the question is answered. I don't consider these changes significantly different than those in R6.

I see a considerable difference beginning when C5 is answered, a four-cycle suppression ("staircase"). An average pattern is continued after C6 is answered, no distortion.

At this point and for the pneumos at C5-R6 (only), I would score at least +2. I can only see two cycles of breathing for R7 from the point of the answer.

END.....

IP: Logged

Barry C
Member
posted 09-25-2006 10:50 AM
Each scoring algorithm is going to score them differently, so there really is no single "computer score."

As far as the "staircase" goes, you really only have a two-cycle suppression (unless you consider what appears to be an answer distortion as a suppression). The second two are at or above prestimulus level, which means normal breathing or relief. That's the problem with just looking at patterns.

Line length means you picture the pneumo tracings flattened out until they are straight, making a straight line. The shortest line is the greatest reaction. You can measure them too, but that requires a computer or a special measuring device. It captures all reactions except a change in baseline.
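For anyone who wants to see the arithmetic, line length can be approximated from sampled data by adding up the straight-line distance between each pair of neighboring samples. This is a minimal sketch, not the code any particular instrument uses; `dt` (the horizontal distance between samples, in the same units as the amplitude) is an assumption, and the result depends on how the two axes are scaled.

```python
import math


def line_length(samples, dt=1.0):
    """Sum of straight-line distances between consecutive samples."""
    return sum(math.hypot(dt, b - a) for a, b in zip(samples, samples[1:]))


# Hypothetical breathing cycles: normal, suppressed, and normal on a
# raised baseline. dt=3 keeps the arithmetic easy (3-4-5 triangles).
normal = [0, 4, 0, 4]
suppressed = [0, 2, 0, 2]
raised_baseline = [5, 9, 5, 9]

print(line_length(normal, dt=3))           # 15.0
print(line_length(suppressed, dt=3))       # shorter than 15.0
print(line_length(raised_baseline, dt=3))  # 15.0, same as normal
```

The suppressed tracing gives the shorter line, and the tracing shifted to a higher baseline gives exactly the same length as the normal one, which is why line length misses a change in baseline.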

How far out you decide to go could determine the score in this one. If you look at 10 seconds worth of line-length, then you've got a zero; however, the baseline increase would give you a +1 - something the computer won't see. If you go farther out - as DoDPI now would, you could argue a +2. Remember, in the pneumos, scores of more than 1 are rare in the research literature supporting the "validated" systems. A 3 is almost unheard of.

I probably wouldn't give it a two, but if you're consistent with RQs and CQs, it usually won't matter in the end, which I've addressed before.

IP: Logged

rnelson
Member
posted 09-25-2006 11:30 AM
I'm tempted to score C5 to R6 too - I rarely go more than 1 on a pneumo.

J.B,

This is not a confirmed case. This is a maintenance polygraph on a guy who has completed treatment (one of perhaps two persons in Colorado) yet remains on probation. He's a former high school principal who was well known and well respected in his community prior to the assault - the community was quite impacted. He's had about 15 or more prior polygraphs. This one doesn't look as good as some of the others.

Barry,

Kircher Extract scores from the lowest point to the highest point, even if the tracing descends at question onset, and even if it subsequently descends again before reacting to an even higher point within the scoring window. I'll post some images and explanations. It does not seem to follow the rules specified by Dutton (2000): the scoring windows are shorter, and EDA (at least) reactions (minimum value or low point) that begin as late as 8 seconds after stimulus onset are not used. Similarly, cardio reactions that descend before ascending are scored from the low point, not the question onset value - cardio measurements have much poorer correlations with hand-scored measurements.

It is important to keep in mind (I'll find the cite again) that RLL values are correlated, but more weakly than others, with deception, and have not always outperformed hand-scored pneumos (however they are a lot more reliable - so they will work better in the long term.)

r

------------------
"Gentlemen, you can't fight in here, this is the war room."
--(from Dr. Strangelove, 1964)

IP: Logged

All times are PT (US)


copyright 1999-2003. WordNet Solutions. All Rights Reserved

Powered by: Ultimate Bulletin Board, Version 5.39c
© Infopop Corporation (formerly Madrona Park, Inc.), 1998 - 1999.